Static Approximation of MPI Communication Graphs for Optimized Process Placement
Abstract
Message Passing Interface (MPI) is the de facto standard for programming large-scale parallel programs. Static understanding of MPI programs informs debugging as well as optimizations such as process placement and communication/computation overlap. In this paper, we present a fully context- and flow-sensitive, interprocedural, best-effort analysis framework to statically analyze MPI programs. We instantiate this framework to determine an approximation of the point-to-point communication graph of an MPI program. Our analysis is the first pragmatic approach to realizing the full point-to-point communication graph without profiling; indeed, our experiments show that we are able to resolve and understand 100% of the relevant MPI call sites across the NAS Parallel Benchmarks. In all but one case, this only requires specifying the number of processes. To demonstrate an application, we use the analysis to determine process placement on a Chip MultiProcessor (CMP) based cluster. The use of a CMP-based cluster creates a two-tier system, where inter-node communication can be subject to greater latencies than intra-node communication. Intelligent process placement can therefore have a significant impact on the execution time. Using the 64-process versions of the benchmarks and our analysis, we see an average improvement of 28% (7%) in communication localization over by-rank scheduling for 8-core (12-core) CMP-based clusters, representing the maximum possible improvement.
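To make the idea of communication localization concrete, the following minimal Python sketch compares by-rank scheduling against an alternative placement on a two-tier cluster. It is not the paper's analysis: the communication graph is a hypothetical strided ring, the function names are illustrative, and the metric (the fraction of communication weight that stays within a node) is an assumed definition that may differ from the one the authors use.

```python
def by_rank_placement(num_procs, cores_per_node):
    """By-rank scheduling: rank r is placed on node r // cores_per_node."""
    return {rank: rank // cores_per_node for rank in range(num_procs)}


def localization(comm_graph, placement):
    """Fraction of total communication weight exchanged between ranks
    that share a node (assumed definition of 'localization')."""
    total = sum(comm_graph.values())
    if total == 0:
        return 0.0
    local = sum(
        weight
        for (src, dst), weight in comm_graph.items()
        if placement[src] == placement[dst]
    )
    return local / total


def ring_graph(num_procs, stride):
    """Hypothetical pattern: each rank sends one unit of data to
    (rank + stride) % num_procs. A real graph would come from analysis."""
    return {(r, (r + stride) % num_procs): 1 for r in range(num_procs)}


if __name__ == "__main__":
    procs, cores = 64, 8
    graph = ring_graph(procs, stride=8)

    by_rank = by_rank_placement(procs, cores)
    # Alternative placement: group ranks with the same residue modulo 8,
    # so each rank shares a node with its stride-8 communication partner.
    grouped = {rank: rank % cores for rank in range(procs)}

    print("by-rank localization:", localization(graph, by_rank))
    print("grouped localization:", localization(graph, grouped))
```

In this contrived pattern every message crosses a node boundary under by-rank scheduling, while the grouped placement keeps all of them inside a node. Real benchmarks sit between these extremes, which is why the achievable gain depends on both the communication graph and the number of cores per node.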
Related resources
Process Mapping for MPI Collective Communications
It is an important problem to map virtual parallel processes to physical processors (or cores) in an optimized way to get scalable performance due to non-uniform communication cost in modern parallel computers. Existing work uses profile-guided approaches to optimize mapping schemes to minimize the cost of point-to-point communications automatically. However, these approaches cannot deal with c...
Current State of the Cray MPT Software Stacks on the Cray XC Series Supercomputers
HPC applications heavily rely on Message Passing Interface (MPI) and SHMEM programming models to develop distributed memory parallel applications. This paper describes a set of new features and optimizations that have been introduced in Cray MPT software libraries to optimize the performance of scientific parallel applications on modern Cray XC series supercomputers. For Cray XC systems based o...
Complexity and approximation ratio of semitotal domination in graphs
A set $S \subseteq V(G)$ is a semitotal dominating set of a graph $G$ if it is a dominating set of $G$ and every vertex in $S$ is within distance 2 of another vertex of $S$. The semitotal domination number $\gamma_{t2}(G)$ is the minimum cardinality of a semitotal dominating set of $G$. We show that the semitotal domination problem is APX-complete for bounded-degree graphs, and the semitotal dominatio...
Towards an Efficient Process Placement Policy for MPI Applications in Multicore Environments
This paper presents a method to efficiently place MPI processes on multicore machines. Since MPI implementations often feature efficient support for both shared-memory and network communication, an adequate placement policy is a crucial step to improve application performance. As a case study, we show the results obtained for several NAS computing kernels and explain how the policy influences...
Distributed Data Placement via Graph Partitioning
With the widespread use of shared-nothing clusters of servers, there has been a proliferation of distributed object stores that offer high availability, reliability and enhanced performance for MapReduce-style workloads. However, relational workloads cannot always be evaluated efficiently using MapReduce without extensive data migrations, which cause network congestion and reduced query throughp...
Publication date: 2014